Vision processes can recognize patterns {pattern recognition, vision} {shape perception}.
patterns
Patterns have objects, features, and spatial relations. Patterns can have points, lines, angles, waves, histograms, grids, and geometric figures. Objects have brightness, hue, saturation, size, position, and motion.
patterns: context
Pattern surroundings and/or background have brightness, hue, saturation, shape, size, position, and motion.
patterns: movement
Mind recognizes objects with translation-invariant features more easily if they are moving. People can recognize objects that they see moving behind a pinhole.
patterns: music
Mind recognizes music by rhythm or by intonation differences around main note. People can recognize rhythms and rhythmic groups. People can recognize melodies transformed from another melody. People most easily recognize same melody in another key. People easily recognize melodies that exchange high notes for low. People can recognize melodies in reverse. People sometimes recognize melodies with both reverse and exchange.
factors: attention
Pattern recognition depends on alertness and attention.
factors: memory
Ease of recall varies with attention level, emotion intensity, cue availability, and previous-occurrence frequency.
animals
Apes recognize objects using fast multisensory processes and slow single-sense processes. Apes do not transfer learning from one sense to another. Frogs can recognize prey and enemy categories [Lettvin et al., 1959]. Bees can recognize colors, except reds, and do circling and wagging dances, which show food-source angle, direction, distance, and amount.
machines
Machines can find, count, and measure picture object areas; classify object shapes; detect colors and textures; and analyze one image, two stereo images, or image sequences. Recognition algorithms can have scale invariance.
process levels
Pattern-recognition processing has three levels. Processing depends on effective inputs and useful outputs {computational level, Marr}. Processing uses functions to go from input to output {algorithmic level, Marr}. Processing machinery performs algorithms {physical level, Marr} [Marr, 1982].
neuron pattern recognition
Neuron dendrite and cell-body synapses contribute different potentials to axon initial region. Input distributions represent patterns, such as geometric figures. Different input-potential combinations can trigger neuron impulse. As in statistical mechanics, because synapse number is high, one input-potential distribution has highest probability. Neurons detect that distribution and no other. Learning and memory change cell and affect distribution detected.
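A minimal sketch of this distribution-detection idea, assuming a model neuron that compares the input distribution to one preferred distribution by normalized overlap; the weights, threshold, and overlap measure are illustrative assumptions, not a biophysical model:

```python
# Sketch of distribution detection: a model neuron sums many synaptic
# potentials and fires only when the input pattern is close to its one
# preferred distribution. Values and threshold are illustrative.
import math

preferred = [0.9, 0.1, 0.8, 0.2]          # distribution this neuron detects

def fires(inputs, threshold=0.95):
    # normalized overlap between input distribution and preferred one
    dot = sum(i * p for i, p in zip(inputs, preferred))
    norm = (math.sqrt(sum(i * i for i in inputs)) *
            math.sqrt(sum(p * p for p in preferred)))
    return dot / norm > threshold          # fires on near-match only

print(fires([0.9, 0.1, 0.8, 0.2]))        # True: matching distribution
print(fires([0.1, 0.9, 0.2, 0.8]))        # False: different distribution
```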
Children and adults immediately recognize their images in mirrors {mirror recognition}. Chimpanzees, orangutans, bonobos, and two-year-old humans, but not gorillas, baboons, and monkeys, can recognize themselves in mirrors after using mirrors for a time [Gallup, 1970].
species member
Animals and human infants recognize that their images in mirrors are species members, but they do not recognize themselves. Perhaps, they have no mirror-reflection concept.
movements
Pigeons, monkeys, and apes can use mirrors to guide movements. Some apes can touch body spots that they see in mirrors. Chimpanzees, orangutans, bonobos, and two-year-old humans, but not gorillas, baboons, and monkeys, can use mirror reflections to perceive body parts and to direct actions [Gallup, 1970].
theory of mind
Autistic children use mirrors normally but appear to have no theory of mind. Animals have no theory of mind.
Will a blind person who knows shapes by touch recognize those shapes when first able to see {Molyneux problem}? Testing cataract patients after surgery has not yet resolved this question.
Brain has mechanisms to recognize patterns {pattern recognition, methods} {pattern recognition, mechanisms}.
mechanism: association
The first and main pattern-recognition mechanism is association (associative learning). Complex recognition uses multiple associations.
mechanism: feature recognition
Object or event classification involves high-level feature recognition, not direct object or event identification. Brain extracts features and feeds forward to make hypotheses and classifications. For example, people can recognize meaningful facial expressions and other complex perceptions in simple drawings that have key features [Carr and England, 1995].
mechanism: symbol recognition
To recognize letters, check each of the four sides for one of six features: point, line, corner, convex curve, W or M shape, or S or squiggle shape. Six features on four sides give 6^4 = 1296 combinations. Letters, numbers, and symbols total fewer than 130, so symbol recognition is robust [Pao and Ernst, 1982].
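A hypothetical sketch of this side-feature coding, with made-up feature tuples standing in for the real tables in Pao and Ernst [1982]:

```python
# Hypothetical sketch of side-feature symbol coding. Each symbol gets a
# 4-tuple: one of the six features per side (top, right, bottom, left).
# 6**4 = 1296 possible codes for fewer than 130 symbols, so codes are sparse.

FEATURES = ["point", "line", "corner", "convex", "wm", "squiggle"]

# Illustrative feature table; real tables come from Pao and Ernst [1982].
SYMBOL_CODES = {
    ("line", "corner", "line", "corner"): "H",
    ("point", "convex", "line", "convex"): "A",
}

def classify(top, right, bottom, left):
    """Look up a symbol by its four side features."""
    return SYMBOL_CODES.get((top, right, bottom, left), "unknown")

print(classify("line", "corner", "line", "corner"))  # -> "H"
```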
mechanism: templates
Templates have non-accidental and signal properties that define object classes. Categories have rules or criteria. Vision uses structural descriptions to recognize patterns. Brains compare input patterns to templates using constraint satisfaction on rules or criteria and then select the best-fitting match by score. If input activates one representation strongly and inhibits others, representation sends feedback to visual buffer, which then augments input image and modifies or completes input image by altering size, location, or orientation. If representation and image then match even better, mind recognizes object. If not, mind inhibits or ranks that representation and activates next representation.
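A minimal sketch of template matching by score, assuming templates and inputs are equal-length feature vectors and using negative Euclidean distance as the score; the feedback loop to the visual buffer is left out:

```python
# Sketch of template matching by score: rank candidate templates
# against an input feature vector and return the best-fitting match.
import math

def score(input_vec, template):
    """Negative Euclidean distance: higher means better fit."""
    return -math.dist(input_vec, template)

def best_template(input_vec, templates):
    """Rank templates by score; the winner would feed back to the buffer."""
    ranked = sorted(templates.items(),
                    key=lambda kv: score(input_vec, kv[1]),
                    reverse=True)
    return ranked[0][0]

templates = {"square": [1.0, 1.0, 0.0], "circle": [0.0, 1.0, 1.0]}
print(best_template([0.9, 1.0, 0.1], templates))  # -> "square"
```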
mechanism: viewpoint
Vision can reconstruct how object appears from any viewpoint using a minimum of two, and a maximum of six, different-viewpoint images. Vision calculates object positions and motions from three views of four non-coplanar points. To recognize objects, vision interpolates between stored representations. Mind recognizes symmetric objects better than asymmetric objects from new viewpoints. Recognition fails for unusual viewpoints.
importance: frequency
For recognition, frequency is more important than recency.
importance: orientation
Recognition processing ignores left-right orientation.
importance: parts
For recognition, parts are more important for nearby objects.
importance: recency
For recognition, recency is less important than frequency.
importance: size
Recognition processing ignores size.
importance: spatial organization
For recognition, spatial organization and overall pattern are more important than parts.
method: averaging
Averaging removes noise by emphasizing low frequencies and minimizing high frequencies.
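A minimal sketch of averaging as a low-pass filter, using a 3-point moving average; the window width is an illustrative choice:

```python
# Sketch of averaging as a low-pass filter: a moving average attenuates
# high-frequency noise while keeping the slow trend.
def moving_average(signal, width=3):
    half = width // 2
    out = []
    for i in range(len(signal)):
        window = signal[max(0, i - half): i + half + 1]
        out.append(sum(window) / len(window))
    return out

noisy = [1.0, 9.0, 2.0, 8.0, 3.0, 7.0]
print(moving_average(noisy))   # high-frequency swings shrink toward the mean
```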
method: basis functions
HBF or RBF basis functions can separate scene into multiple dimensions.
method: cluster analysis
Pattern recognition can place classes or subsets in clusters in abstract space.
method: feature deconvolution
Cerebral cortex can separate feature from feature mixture.
method: differentiation
Differentiation subtracts second derivative from intensity and emphasizes high frequencies.
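A minimal sketch of this sharpening, subtracting the discrete second derivative (the 1-D Laplacian) from a signal; the gain k is an illustrative parameter:

```python
# Sketch of sharpening by subtracting the discrete second derivative
# (1-D Laplacian) from the signal, which emphasizes high frequencies.
def sharpen(signal, k=1.0):
    out = list(signal)
    for i in range(1, len(signal) - 1):
        laplacian = signal[i - 1] - 2 * signal[i] + signal[i + 1]
        out[i] = signal[i] - k * laplacian
    return out

edge = [1, 1, 1, 5, 5, 5]       # a step edge
print(sharpen(edge))             # overshoot at the step marks the edge
```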
method: generalization
Vision generalizes patterns by eliminating one dimension, using one subpattern, or including outer domains.
method: index number
Patterns can have algorithm-generated unique, unambiguous, and meaningful index numbers. Running reverse algorithm generates pattern from index number. Similar patterns have similar index numbers. Patterns differing by subpattern have index numbers that differ only by ratio or difference. Index numbers have information about shape, parts, and relations, not about size, distance, orientation, incident brightness, incident light color, and viewing angle.
Index numbers can be power series. Term coefficients are weights. Term sums are typically unique numbers. For patterns with many points, index number is large, because information is high.
Patterns have a unique point, like gravity center. Pattern points have unique distances from unique point. Power-series terms are for pattern points. Term sums are typically unique numbers that depend only on coordinates internal to pattern. Patterns differing by subpattern differ by ratio or difference.
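An illustrative sketch of the index-number idea, assuming patterns are 2-D point sets and using a weighted power series over each point's distance from the centroid; the weights and powers are assumptions, and this toy index is translation-invariant but not size-invariant:

```python
# Illustrative index number: a weighted power series over each point's
# distance from the pattern centroid. Only pattern-internal distances
# enter, so a translated copy keeps the same index number.
import math

def index_number(points, weights=(1.0, 0.5, 0.25)):
    cx = sum(x for x, _ in points) / len(points)   # centroid: the unique point
    cy = sum(y for _, y in points) / len(points)
    total = 0.0
    for x, y in points:
        d = math.hypot(x - cx, y - cy)             # pattern-internal coordinate
        total += sum(w * d ** p for p, w in enumerate(weights, start=1))
    return total

triangle = [(0, 0), (1, 0), (0, 1)]
shifted = [(5, 5), (6, 5), (5, 6)]                 # same shape, translated
print(math.isclose(index_number(triangle), index_number(shifted)))  # True
```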
method: lines
Pattern recognition uses shortest line, extends line, or links lines.
method: intensity
Pattern recognition uses gray-level changes, not colors. Motion detection uses gray-level and pattern changes.
method: invariance
Features can remain invariant as images deform or move. Holding all variables except one constant gives the partial derivative with respect to the varying variable; partial derivatives measure changes and differences and so help find invariants.
method: line orientation
Secondary visual cortex neurons can detect line orientation, have large receptive fields, and have variable topographic mapping.
method: linking
Vision can connect pieces in sequence and fill gaps.
method: optimization
Vision can use dynamic programming to optimize parameters.
method: orientation
Vision accurately knows surface tilt and slant, directly, by tilt angle itself, not by angle function [Bhalla and Proffitt, 1999] [Proffitt et al., 1995].
method: probability
Brain uses statistics to assign probability to patterns recognized.
method: registers
Brain-register network can store pattern information, and brain-register network series can store processes and pattern changes.
method: search
Matching can use heuristic search to find feature or path. Low-resolution search over whole image looks for matches to feature templates.
method: separation into parts
Vision can separate scene into additive parts, by boundaries, rather than using basis functions.
method: sketching
Vision uses contrast for boundary making.
To recognize structure, brain can use information about that structure {instructionism, recognition}.
To recognize structure, brain can compare to multiple variations and select best match {selectionism, recognition}, just as cells try many antibodies to bind antigen.
To identify objects, algorithms can test patterns against feature sets. If pattern has a feature, algorithm adds that feature's distinctiveness weight to the object's distinctiveness-weight sum. If object's sum exceeds threshold {detection threshold} {threshold of detection}, algorithm identifies pattern as object. Context sets detection threshold.
In recognition algorithms, object features can have weights {distinctiveness weight}, based on how well feature distinguishes object from other objects. Algorithm designers use feature-vs.-weight tables or automatically build tables using experiences.
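A minimal sketch of detection by distinctiveness-weight sum against a threshold; the feature names, weights, and threshold are illustrative:

```python
# Sketch of threshold detection with distinctiveness weights. Weights
# would come from designer tables or be learned from experience.
WEIGHTS = {"has_tail": 0.2, "whiskers": 0.5, "meows": 0.9}

def detect(observed_features, threshold=1.0):
    """Sum weights of observed features; detect when sum exceeds threshold.
    Context would set the threshold; here it is a parameter."""
    total = sum(WEIGHTS.get(f, 0.0) for f in observed_features)
    return total > threshold, total

print(detect({"whiskers", "meows"}))   # (True, 1.4): over threshold
print(detect({"has_tail"}))            # (False, 0.2): under threshold
```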
Sharp brightness or hue difference indicates edge or line {edge detection}. Point clustering indicates edges. Vision uses edge information to make object boundaries and adds information about boundary positions, shapes, directions, and noise. Neuron assemblies have different spatial scales to detect different-size edges and lines. Tracking and linking connect detected edges.
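A minimal sketch of edge detection in one image row, marking points where brightness changes sharply; the threshold is an illustrative parameter:

```python
# Sketch of edge detection: a sharp brightness difference (large
# gradient) between neighboring pixels marks an edge point.
def edges(intensities, threshold=2.0):
    """Return indices where brightness changes sharply."""
    return [i for i in range(1, len(intensities))
            if abs(intensities[i] - intensities[i - 1]) > threshold]

row = [10, 10, 11, 30, 31, 30, 12, 11]   # bright band in a dark row
print(edges(row))                         # -> [3, 6]: the band boundaries
```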
Algorithms {Gabor transform} {Gabor filter} can make series whose terms stand for independent visual features and have constant amplitude; term sums form the series [Palmer et al., 1991]. Visual-cortex complex cells act like Gabor filters with power series. Terms have variables raised to powers. Complex-cell types are for specific surface orientation and object size. Gabor-filter complex cells typically make errors for edge gaps, small textures, blurs, and shadows.
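A minimal sketch of a 1-D Gabor filter, a sinusoid under a Gaussian envelope, convolved with a grating at the filter's preferred frequency; sizes and frequencies are illustrative:

```python
# Sketch of a 1-D Gabor filter: a cosine under a Gaussian envelope.
# Convolving with it responds strongly to gratings and edges at the
# filter's preferred frequency, as visual-cortex complex cells do.
import math

def gabor_kernel(size=9, sigma=2.0, freq=0.25, phase=0.0):
    half = size // 2
    return [math.exp(-(x * x) / (2 * sigma * sigma)) *
            math.cos(2 * math.pi * freq * x + phase)
            for x in range(-half, half + 1)]

def respond(signal, kernel):
    """Valid-mode convolution: filter response at each position."""
    k = len(kernel)
    return [sum(signal[i + j] * kernel[j] for j in range(k))
            for i in range(len(signal) - k + 1)]

grating = [math.cos(2 * math.pi * 0.25 * x) for x in range(32)]
print(max(respond(grating, gabor_kernel())))  # strong response at freq 0.25
```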
Non-parametric algorithms {histogram density estimate} can estimate density. Algorithm tests various cell sizes by nearest-neighbor method or kernel method. Density estimate is point count per cell volume.
Using Bayesian theory, algorithms {image segmentation} can extend edges to segment image and surround scene regions.
Algorithms {kernel method} can test various cell sizes, to see how small volume must be to have only one point.
Algorithms {linear discriminant function} (Fisher) can find abstract-space hypersurface boundary between space regions (classes), using region averages and covariances.
Algorithms {memory-based models} (MBM) can match input-pattern components to template-pattern components, using weighted sums, to find highest scoring template. Scores are proportional to similarity. Memory-based models uniquely label component differences. Memory-based recognition, sparse-population coding, generalized radial-basis-function (RBF) networks, and hyper-basis-function (HBF) networks are similar algorithms.
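A minimal sketch of memory-based matching with a radial basis function, scoring stored templates by a Gaussian of distance and picking the highest scorer; the templates and sigma are illustrative:

```python
# Sketch of memory-based matching with a radial basis function (RBF):
# each stored template scores by a Gaussian of its distance to the
# input, and the highest-scoring template wins.
import math

def rbf_score(input_vec, template, sigma=1.0):
    d2 = sum((a - b) ** 2 for a, b in zip(input_vec, template))
    return math.exp(-d2 / (2 * sigma * sigma))   # similarity in (0, 1]

def recognize(input_vec, memory):
    return max(memory, key=lambda name: rbf_score(input_vec, memory[name]))

memory = {"edge": [1.0, -1.0], "bar": [1.0, 1.0]}
print(recognize([0.9, 0.8], memory))             # -> "bar"
```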
Vision can manipulate images to see if two shapes correspond. Vision can zoom, rotate, stretch, color, and split images {mental rotation} [Shepard and Metzler, 1971] [Shepard and Cooper, 1982].
high level
Images transform by high-level perceptual and motor processing, not sense-level processing. Image movements follow abstract-space trajectories or proposition sequences.
motor cortex
Motor processes transform visual mental images, because spatial representations are under motor control [Shiekh, 1983].
time
People require more time to perform mental rotations that are physically awkward. Vision compares aligned images faster than translated, rotated, or inverted images.
Algorithms {nearest neighbor method} can test various cell sizes to see how many points (nearest neighbor) are in cells.
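A sketch contrasting the kernel method (fix the cell size, count points) with the nearest-neighbor method (fix the point count k, grow the cell), in one dimension with illustrative data:

```python
# Sketch of two density estimates in 1-D: the kernel method fixes the
# cell size and counts points; the nearest-neighbor method fixes k and
# grows the cell until it holds k points.
def kernel_density(x, data, width=1.0):
    count = sum(1 for p in data if abs(p - x) <= width / 2)
    return count / (len(data) * width)            # points per unit length

def knn_density(x, data, k=3):
    radius = sorted(abs(p - x) for p in data)[k - 1]
    return k / (len(data) * 2 * radius)           # k points in cell 2*radius

data = [0.1, 0.2, 0.25, 0.9, 1.5, 3.0]
print(kernel_density(0.2, data), knn_density(0.2, data))
```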
Algorithms {pattern matching} can try to match two network representations by two parallel searches, starting from each representation. Searches look for similar features, components, or relations. When both searches meet, they excite the intermediate point (not necessarily simultaneously), whose signals indicate matching.
Algorithms {pattern theory} can use feedforward and feedback processes and relaxation methods to move from input pattern toward memory pattern. Algorithm uses probabilities, fuzzy sets, and population coding, not formal logic.
For algorithms or observers, graphs {receiver operating characteristics} (ROC) can show true identification-hit rate versus false-hit rate. If the curve is the 45-degree diagonal, observer has as many false hits as true hits (chance performance). If the curve rises steeply, observer has mostly true hits and few false hits. If the curve rises with maximum slope, observer has zero false hits and all true hits.
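A minimal sketch of building ROC points, sweeping a criterion over illustrative signal and noise scores and recording hit rate versus false-hit rate:

```python
# Sketch of ROC points: sweep a criterion over scores from
# signal-present and noise-only trials, recording hit rate versus
# false-hit rate at each criterion setting. Scores are illustrative.
signal = [2.1, 1.8, 2.5, 1.2, 2.9]   # scores on signal-present trials
noise = [0.9, 1.5, 0.4, 1.1, 2.0]    # scores on noise-only trials

for criterion in (0.5, 1.0, 1.5, 2.0, 2.5):
    hit_rate = sum(s > criterion for s in signal) / len(signal)
    false_rate = sum(n > criterion for n in noise) / len(noise)
    # points above the 45-degree diagonal: true hits outnumber false hits
    print(criterion, hit_rate, false_rate)
```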
Vision finds, separates, and labels visual areas by enlarging spatial features or partitioning scenes {region analysis}.
expanding
Progressive entrainment of larger and larger cell populations builds regions using synchronized firing. Regions form by clustering features, smoothing differences, relaxing/optimizing, and extending lines using edge information.
splitting
Regions can form by splitting spatial features or scenes. Parallel circuits break large domains into similar-texture subdomains for texture analysis. Parallel circuits find edge ends by edge interruptions.
For feature detection, brain can use classifying context or constrain classification {relational matching}.
Algorithms {response bias} can use recognition criteria iteratively set by receiver operability curve.
Vision separates scene features into belonging to object and not belonging {segmentation problem}. Large-scale analysis comes first, then local constraints. Context hierarchically divides image into non-interacting parts.
If brain knows reflectance and illumination, shading {shading} can reveal shape. Line and edge detectors can find shape from shading.
Motion change and retinal disparity are equivalent perceptual problems, so finding distance from retinal disparity and finding shape from motion changes {shape from motion} use equivalent techniques.
Algorithms {signal detection theory} can find patterns in noisy backgrounds. Patterns have stronger signal strength than noise. Detectors have sensitivity and response criteria.
Vision can label vertices as three-intersecting-line combinations {vertex perception}. Intersections can be convex or concave, to right or to left.
Classification algorithms {production system} can use IF/THEN rules on input to conditionally branch to one feature or object. Production systems have three parts: fact database, production rule, and rule-choosing control algorithm.
database
Fact-database entries code for one state {local representation, database}, allowing memory.
rules
Production rules have form "IF State A, THEN Process N". Rules with same IF clause have one precedence order.
controller
Controller checks all rules, performing steps in sequence {serial processing}. For example, if system is in State A and rule starts "IF State A", then controller performs Process N, which uses fact-database data.
states
Discrete systems have state spaces whose axes represent parameters, with possible values. System starts with initial-state parameter settings and moves from state to state, along a trajectory, as controller applies rules.
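A minimal sketch of a production system with a fact database, IF/THEN production rules, and a serial controller that fires the first matching rule in precedence order; the states and processes are illustrative:

```python
# Sketch of a production system: fact database, production rules, and
# a rule-choosing controller (the three parts named above).
facts = {"state": "A", "data": 0}        # fact database

def process_n(db):                        # Process N uses database facts
    db["data"] += 1
    db["state"] = "B"

def process_m(db):
    db["state"] = "done"

RULES = [                                 # (IF-condition, THEN-process)
    (lambda db: db["state"] == "A", process_n),
    (lambda db: db["state"] == "B", process_m),
]

while facts["state"] != "done":           # controller: serial processing
    for condition, process in RULES:      # precedence order
        if condition(facts):
            process(facts)
            break

print(facts)                              # -> {'state': 'done', 'data': 1}
```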
Production systems have rules {production rule} for moving from one state to the next. Production rules have form "IF State A, THEN Process N". Rules with same IF clause have one precedence order.
Parallel pattern-recognition mechanisms can fire whenever they detect patterns {ACT production system}. Firing puts new data elements in working memory.
Same production can match same data only once {Data Refractoriness production system}.
Production with best-matched IF-clause can have priority {Degree of Match production system}.
Goals are data elements put into working memory. Only one goal can be active at a time {Goal Dominance}, so productions whose output matches active goal have priority.
Recently successful productions can have higher strength {Production Strength production system}.
Parallel pattern-recognition mechanisms can fire whenever they detect particular patterns {Soar production system}. Firing puts new data elements in working memory.
If two productions match same data, production with more-specific IF-clause wins {Specificity production system}.
Neuron assemblies can hold essential knowledge about patterns {explicit representation}, using information not in implicit representation. Mind calculates explicit representation from implicit representation, using feature extraction or neural networks [Kobatake et al., 1998] [Logothetis and Pauls, 1995] [Logothetis et al., 1994] [Sheinberg and Logothetis, 2001].
Neuron or pixel sets can hold object image {implicit representation}, with no higher-level knowledge. Implicit representation samples intensities at positions at times, like bitmaps [Kobatake et al., 1998] [Logothetis and Pauls, 1995] [Logothetis et al., 1994] [Sheinberg and Logothetis, 2001].
Algorithms {generalized cone} can describe three-dimensional objects as conical shapes, with axis length/orientation and circle radius/orientation. Main and subsidiary cones can be solid, hollow, inverted, asymmetric, or symmetric. Cone surfaces have patterns and textures [Marr, 1982]. Cone descriptions can use three-dimensional Fourier spherical harmonics, which have volumes, centroids, inertia moments, and inertia products.
Algorithms {generalized cylinder} can describe three-dimensional objects as cylindrical shapes, with axis length/orientation and circle radius/orientation. Main and subsidiary cylinders can be solid, hollow, inverted, asymmetric, or symmetric. Cylindrical surfaces have patterns and textures. Cylinder descriptions can use three-dimensional Fourier spherical harmonics, which have volumes, centroids, inertia moments, and inertia products.
Representations can describe object parts and spatial relations {structural description}. Structure units can be three-dimensional generalized cylinders (Marr), three-dimensional geons (Biederman), or three-dimensional curved solids {superquadrics} (Pentland). Structural descriptions are only good for simple recognition {entry level recognition}, not for superstructures or substructures. Vision uses viewpoint-dependent recognition, not structural descriptions.
Shape representations {template} can hold information for mechanisms to use to replicate or recognize {template theory} {naive template theory}. Template is like memory, and mechanism is like recall. Template can be coded units, shape, image, model, prototype, or pattern. Artificial templates include clay or wax molds. Natural templates are DNA/RNA. Templates can be abstract-space vectors. Using templates requires templates for all viewpoints, and so many templates.
Representations {vector coding} can be sense-receptor intensity patterns and/or brain-structure neuron outputs, which make feature vectors. Vector coding can identify rigid objects in Euclidean space. Vision uses non-metric projective geometry to find invariances by vector analysis [Staudt, 1847] [Veblen and Young, 1918]. Motor-representation middle and lower levels use code that indicates direction and amount.
The feeling of seeing whole scene {scene, vision} {vision, scene} results from maintaining general scene sense in semantic memory, attending repeatedly to scene objects, and forming object patterns. Vision experiences whole scene (perceptual field), not just isolated points, features, surfaces, or objects. Perceptual field provides background and context, which can identify objects and events.
scale
Scenes have different spatial frequencies in different directions and distances. Scenes can have low spatial frequency and seem open. Low-spatial-frequency scenes have more depth, less expansiveness, and less roughness, and are more typical of natural settings. Scenes can have high spatial frequency and seem closed. High-spatial-frequency scenes have less depth, more expansiveness, and more roughness, and are more typical of towns.
Scenes have numbers of objects {set size, scene}.
Scenes have patterns or structures of object and object-property placeholders {spatial layout}, such as smooth texture, rough texture, enclosed space, and open space. In spatial layouts, object and property meanings do not matter, only placeholder pattern. Objects and properties can fill object and object property placeholders to supply meaning. Objects have spatial positions, and relations to other objects, that depend on spacing and order. Spatial relations include object and part separations, feature and part conjunctions, movement and orientation directions, and object resolution.
Scenes have homogeneous color and texture regions {visual unit}.
Vision can recognize geometric features {shape, pattern} {pattern, features}.
lines
Shapes have lines, line orientations, and edges. Contour outlines indicate objects and enhance brightness and contrast. Irregular contours and hatching indicate movement. Contrast enhances contours, for example with Mach bands. Contrast differences divide large surfaces into parts.
axes
Shapes have natural position axes, such as vertical and horizontal, and natural shape axes, such as long axis and short axis. Vision uses horizontal, vertical, and radial axes for structure and composition.
relations
Objects are wholes and have parts. Wholes are part integrations or configurations and are about gist. Parts are standard features and are about details.
surfaces
Shape has surfaces, with surface curvatures, orientations, and vertices. Visual system can label lines and surfaces as convex, concave, or overlapping [Grunewald et al., 2002]. Shapes have shape-density functions, with projections onto axes or chords [Grunewald et al., 2002]. Shapes have distances and natural metrics, such as lines between points.
illuminance
Shapes have illuminance and reflectance.
Shapes have axis and chord ratios {area eccentricity} [Grunewald et al., 2002].
Shapes have perimeter squared divided by area {compactness, shape} [Grunewald et al., 2002].
Shapes have minimum chain-code sequences that make shape classes {concavity tree}, which have maximum and minimum concavity-shape numbers [Grunewald et al., 2002].
Shapes have connectedness {Euler number, shape} [Grunewald et al., 2002].
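A minimal sketch computing two of the descriptors above, compactness and axis eccentricity, for simple figures; a circle minimizes compactness at 4*pi:

```python
# Sketch of two shape descriptors: compactness (perimeter squared over
# area) and axis-ratio eccentricity, for simple figures.
import math

def compactness(perimeter, area):
    return perimeter ** 2 / area

r = 1.0
print(compactness(2 * math.pi * r, math.pi * r ** 2))  # circle: 4*pi ~ 12.57
print(compactness(4 * 1.0, 1.0))                       # unit square: 16.0

def eccentricity(major_axis, minor_axis):
    """Axis ratio: 1 for a circle, larger for elongated shapes."""
    return major_axis / minor_axis

print(eccentricity(4.0, 2.0))                          # -> 2.0
```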
Pattern recognition can use conscious memory {explicit recognition} [McDougall, 1911] [McDougall, 1923].
Pattern recognition can be automatic {implicit recognition} [McDougall, 1911] [McDougall, 1923], like reflexes.